AUTOMATED CODE GENERATION FOR DISCONTINUOUS 
GALERKIN METHODS 
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Abstract. A compiler approach for generating low- level computer code from high-level input 
for discontinuous Galerkin finite element forms is presented. The input language mirrors conven- 
tional mathematical notation, and the compiler generates efficient code in a standard programming 
language. This facilitates the rapid generation of efficient code for general equations in varying 
spatial dimensions. Key concepts underlying the compiler approach and the automated generation 
of computer code are elaborated. The approach is demonstrated for a range of common problems, 
including the Poisson, biharmonic, advection— diffusion and Stokes equations. 
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1. Introduction. Discontinuous Galerkin methods in space have emerged as 
a generalisation of finite element methods for solving a range of partial differential 
equations. While historically used for first-order hyperbolic equations, discontinuous 
Galerkin methods are now applied to a range of hyperbolic, parabolic and elliptic 
problems. In addition to the usual integration over cell volumes that characterises 
the conventional finite element method, discontinuous Galerkin methods also involve 
the integration of flux terms over interior facets. Discontinuous Galerkin methods 
exist in many variants, and are generally distinguished by the form of the flux on 
facets. A sample of fluxes for elliptic problems can be found in [4]. 

We present here a compiler approach for generating computer code for discontin- 
uous Galerkin forms. From a high-level input language which resembles conventional 
mathematical notation, low-level computer code is generated automatically. The gen- 
erated code is called by an assembler to construct global sparse tensors, commonly 
known as the 'stiffness matrix' and the 'load vector'. The compiler approach affords 
a number of interesting possibilities. It permits the rapid prototyping and testing of 
new methods, as well as providing scope for producing optimised code. The latter can 
be achieved through the compiler by precomputing various terms which are tradition- 
ally evaluated at run time, and by deploying procedures for analysing the structure 
of forms which facilitates various a priori optimisations which may not be tractable 
when developing computer code in a conventional fashion. In addition, the represen- 
tations of element tensors (element stiffness matrices) for a given variational form are 
not limited to the usual quadrature-loop approach. For many forms, computation- 
ally more efficient representations can be employed. In essence, the form compiler 
approach allows a high level of generality, while competing in terms of performance 
with specialised, dedicated code, as will be elaborated in this work. 

The use of a form compiler is particularly attractive for mixed problems, where 
one may wish to work with a combination of continuous and discontinuous function 
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spaces, and function spaces which differ on element interiors. Such problems become 
trivial in the context of a compiler, as the compiler can automatically generate a 
degrce-of-freedom mapping, thereby alleviating a difficulty faced when using many 
legacy codes for mixed problems. It also bears emphasis that the compiler provides 
the necessary operators to generate code not only for discontinuous Galcrkin methods, 
but also for a range of novel finite element methods that draw upon discontinuous 
Galcrkin methods. These methods may not involve discontinuous function spaces but 
do involve integration over interior facets. Such examples can be found in [8, 22, 16]. 

The concepts presented in this work are implemented in the FEniCS Form Com- 
piler (henceforth FFC). FFC is a component of the FEniCS project [2.3], which consists 
of a suite of tools which aim to automate computational mathematical modelling, and 
all components are released under a GNU public license. FFC is freely available at 
http://www.fenicsproject.org and will generate code for all examples presented 
in this work. 

The rest of this work is arranged as follows. Section 2 considers aspects of the 
assembly of variational forms and the representation of finite clement tensors. This is 
followed by key concepts for the assembly of discontinuous Galcrkin forms in Section 3. 
We discuss the form compiler FFC and the performance of the code it generates in 
Section 4 and a number of examples are presented in Section 5. 

2. Compiling and assembling finite element variational forms. In this 
section, we outline a general framework for compiling and assembling variational 
forms. We then extend the framework to discontinuous Galerkin methods in the fol- 
lowing section, where the form compiler must also consider integrals of discontinuous 
integrands over interior facets 1 of the computational mesh. 

2.1. Multilinear forms. Consider a general multilinear form, 

a : V£ x V* x • • • x V£ -> K, (2.1) 

defined on the product space x V£ x • ■ • x Vf t of a sequence {V^}^ =1 of discrete 
function spaces on a triangulation T of a domain f2 C M d . 

Multilinear forms appear as the basic building blocks of finite element discreti- 
sations of partial differential equations. The canonical example is the standard vari- 
ational formulation of Poisson's equation —Am = /. It reads: find Uh 6 Vh such 
that 

a{v,u h )=L{v) VveV hl (2.2) 

where Vh = V h x and Vh = V^ is a pair of discrete finite element function spaces (the 
test and trial spaces). The bilinear form a is here given by 

a(v,u) = / WvVudx (2.3) 
Jn 

and the linear form L is given by 

L(v) = { vfdx, (2.4) 
Jn 

For this problem, r — 2 for the bilinear form a and r = 1 for the linear form L, but 
forms of higher arity also appear (see [18]). 

1 A facet is a topological entity of a computational mesh of dimension D — 1 (codimension 1) 
where D is the topological dimension of the cells of the computational mesh. Thus for a triangular 
mesh, the facets are the edges and for a tctrahedral mesh, the facets are the faces. 
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2.2. Finite element assembly. With {4>i}f = i a global basis for Vh = Vh, one 

may obtain the solution Uh = X/-»=i ^i^i °f the variational problem (2.2) by solving a 
linear system AU = b for the degrees of freedom U of the discrete solution Uh, where 
Aij = J^\7(f>i ■ \7(f>jdx and hi = J n 4>ifdx. In general, we are concerned with the 
discretisation of the general multilinear form a of (2.1), that is, the computation of 
the (sparse) rank r tensor A obtained by applying the multilinear form to the basis 
functions {<^} i= fi of for j = 1, 2, . . . ,r: 

A i = a(ti 1 ,<fi 2 ,...,<tr ir ), (2.5) 

where i = (ii, i<i y . . . , i r ) is a multi-index. 

The tensor A may be computed efficiently by an algorithm known as assembly [24, 
7, 17]. This algorithm computes the tensor A by iterating over the elements of the 
triangulation T = {K} and adding the local contribution from each local element K 
to the global tensor A. We refer to the local contribution from each element as the 
element tensor A K [18]. For r = 2, this is normally referred to as the 'element stiffness 
matrix'. With {(j> i ' J }"l 1 a local basis for V£ on clement K, the clement tensor A K is 
given by 

Af =a K (^\^ 2 ,...,^f), (2.6) 

where q,k is the local contribution to the multilinear form from element K. In the case 
of the bilinear form for Poisson's equation, this contribution is given by ok(v, u) = 
J K Vu ■ Vu dx. 

2.3. Tensor representation. In [11, 15, 12, 13], it was demonstrated that by 
generating low-level code from a special tensor representation of the element ten- 
sor A K , one may generate (compile) very efficient code for assembly of the corre- 
sponding global matrix A. We return to the issue of the efficiency of the generated 
code in Section 4.3. 

We demonstrate here how to derive this tensor representation for a basic example 
below and refer to [13] for a general representation theorem. Consider the bilinear 
form for the weighted Laplacian —V • (wVu) = f, 

a(v, u) — / w\7v-Vudx. (2-7) 
Jo. 

The corresponding element tensor A K is given by 

Af = f wVtfc -tf 2 dx. (2.8) 

JK 

For simplicity, we consider here the case where = = Vh- Now, let Fk ■ Kq — > K 
be the standard affine mapping from a reference element Kq to any given element 
K G T. Using a change of variables from the reference coordinates X to the real 
coordinates x = Fk(X), we find that 

*?-tt± ^ F >«-t a -tr a -tr L •-fr-tr-'*' (2 - 9) 

«i=i aj =i a ,=i 0=1 OXfi 0X0 Jk ° OAai OA " ! 

where <i> denotes basis functions on the reference element, n is the number of degrees 
of freedom for the local basis of w, d is the dimension of the domain Q, and we 
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have expanded w in the local nodal basis of Vt- By defining the two tensors A° = 
Ik (p a 3 §Y^§Y^dX and G K = dct F' K w a3 Yfp=i li^li^ we ma y cx P rcss thc 
element tensor A K as the tensor contraction 

Af = Y,A° ia G a K . (2.10) 

a 

We refer to A as the reference tensor and to Gk as the geometry tensor. 

Thc main point of this representation is that the reference tensor A is indepen- 
dent of the triangulation T and may thus be precomputcd. During assembly, one 
may then iterate over all elements of the triangulation and on each element K com- 
pute the geometry tensor Gk, compute the tensor contraction (2.10) and then add 
the resulting element tensor A K to the global sparse matrix A. The form compiler 
FFC automatically generates the tensor representation (2.10); i.e., it precomputes the 
reference tensor A at compile-time and generates code for evaluating thc geometry 
tensor Gk and the tensor contraction. 

3. Extending the framework to discontinuous Galerkin methods. To 

extend the above framework for finite element assembly to discontinuous Galerkin 
methods, we need to consider variational forms expressed as integrals over the interior 
facets of a finite element mesh. Consider for example thc following bilinear form which 
may appear as a term in a discontinuous Galerkin formulation: 

a(v,u)= J2 I MM ds, (3.1) 

SediT s 

where dfT denotes the set of all interior facets of the triangulation T and where [v] 
denotes the jump in the function value of v across the facet S: 

H =«+-«". (3.2) 

Here, v + and v~ denote the values of v on the facet S as seen from the two cells K + 
and K~ incident with S, respectively (see Figure 3.1). We note that each interior 
facet is incident to exactly two cells, and we may label these two cells K + and K~ , 

3.1. A general assembly algorithm. To assemble the global sparse tensor A 
for variational forms that contain integrals over interior facets as in (3.1), one may 
extend the standard assembly algorithm over the cells of the computational mesh by 
including an iteration over the interior facets of the mesh. Similarly, one may extend 
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the assembly to include the exterior facets (the set of facets incident with dfl) to 
account for terms that involve integrals over the boundary of the mesh. 

A general assembly algorithm for the computation of the global sparse tensor A 
is given in [1]. This algorithm iterates first over all the cells of the mesh to compute 
the local contribution from each cell to the global sparse tensor. Above, we referred 
to this contribution as the element tensor A K . In the context of the general assembly 
algorithm we shall refer to the contribution from each cell as the cell tensor A . 
Similarly, one may iterate over the exterior and interior facets of the mesh and add 
the local contribution from each facet to the global sparse tensor. We refer to these 
local contributions, denoted by A s , as the exterior or interior facet tensors. 

3.2. Computing the interior facet tensor. To define the interior facet ten- 
sor A for a given multilinear form a expressed as an integral over the set of interior 
facets di^l such as in (3.1), we write 

= a(c/> n ,0 l2 ) = ^as(<&i,<fc 2 )) (3.3) 
s 

where the summation is carried out only over those interior facets where both <f>^ and 
4>i 2 are nonzero. In the case of (3.1), we have as(v, u) = J s [vl[[uJ ds. To assemble the 
global sparse tensor A efficiently, one may introduce a local-to- global mapping that 
maps the basis functions on the local facet S to the set of global basis functions. To 
construct this mapping, consider again two cells K + and K~ sharing a common facet 
S as in Figure 3.1. As above, let {4>i}iLi be a global (possibly discontinuous) basis for 
Vh- For ease of notation, we consider the assembly of the global tensor (matrix) for a 
bilinear form for V h x = V h 2 — Vh and drop the index j (see equation (2.5)). We thus 
assume here that all discrctising function spaces are equal, but note that this is not 
necessary. In particular, our implementation in FFC docs not make this assumption 
and is able to generate code for assembly of tensors of arbitrary ranks for arbitrary 
combinations of finite element function spaces. 

Furthermore, let {0f be the local finite element basis on K + and let simi- 
larly {(j>f }" =1 be the local finite element basis on K~ . We now extend these local 
basis functions to the macro cell K = K + U K~ by the following construction: 

!<Af + (x), i = l,2,...,n, xeK+, 

0, i = l,2,...,n, xeK-, 

0, i = n + l,n + 2,...,2n, xeK+, [ ' 

<f)fS n (x), i = n + l,n + 2,...,2n, xeK~. 

We thus extend the local basis functions on K + and K~ to K by zero to obtain a 
local finite element space on K of dimension 2n. For each K E T, we further let 

i K :[l,n}^[l,N] (3.5) 

be a standard local-to- global mapping, that is, a mapping from a local enumeration 
of the basis functions on cell K to the corresponding global basis functions such that 
4>f = 4> UK (i)\K for i = 1,2, ... ,n. By the construction (3.4), we obtain a local-to- 
global mapping for K (or S). Thus, t^-(l) = lk+ (1)> • ■ • , L ii( n ) = l k+ (n), Lii(n + 1) = 
L K -(l),...,L R (2n) = i K -{n). 

We may now proceed to define the interior facet tensor A s . Consider first the 
case when l r is an injective mapping and note that i R is injective when the ranges of 
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l k + and l k - are disjoint (which is the case for discontinuous elements). Continuing 
from (3.3), we then obtain 

Al = = (^f_ 1(ii)) ^f_ 1(i2) ) =E^),^)' ( 3 - 6 ) 

where the interior facet tensor A s is thus defined by 

A? = as(4%,4g), ii.»2 = 1,2,..., 2n. (3.7) 
It follows that the global tensor A may be computed by Algorithm 1. 

Algorithm 1 Assembly algorithm over interior facets. 
A = 

for S 6 9iT 

for ii, 12 = 1, 2, . . . , 2n 

end for 

for ii, %2 = 1, 2, . . . , In 

end for 
end for 



Now, if i^- is not injective (two local basis functions are restrictions of the same 
global basis function), which may happen if the basis functions are continuous, we 
may still assemble the global tensor A by Algorithm 1 and compute the interior facet 
tensor as in (3.7). To see this, assume that t^-(ii) = t-K^'i) = I f° r some i\ ^ i[. It 
then follows that the entry A i tiR (i 2 ) will be a sum of the two terms Af ± i2 and Af, ^ 
(and possibly other terms). Since as is bilinear, we have 

<i 2 = a s$nM) + "s$$M) = asi^ + ^jfj = asifa^l), (3.8) 

where by the construction (3.4) (f>j is the global basis function that both ^ and (f>p 
are mapped to. 

3.3. Tensor representation and precomputation on facets. In Section 2, 
we described how the cell tensor (element tensor) may be computed from the tensor 
representation 

A?=Y, A ^ G k- ( 3 - 9 ) 

a 

Similarly, one may use the affine mappings F K + and F K - to obtain a tensor repre- 
sentation for the interior facet tensor A s . However, depending on the topology of 
the macro cell K, one obtains different tensor representations. For a triangular mesh, 
each cell has three facets (edges) and there are thus 3x3 = 9 different topologies to 
consider; there are nine different ways in which two edges can meet. Similarly, for 
a tetrahedral mesh, there are 4 x 4 = 16 different topologies to consider. 2 We thus 

2 FFC assumes a particular local ordering of the entities of the computational mesh as described 
in [19]. If no particular ordering of the mesh entities is assumed, one needs to consider 3 X 3 X 2 = 18 
different topologies for triangles and 4 X 4 X 6 = 96 topologies for tetrahedra. This is because there 
are two different ways to superimpose two edges, and there are six ways to superimpose two faces. 
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obtain a tensor representation of the form 

A? = Y,^ +iShriS)G hsy (3-10) 

a 

where / + and /~ denote the local numbers of the two facets that meet at S relative 
to the two cells K + and K~ respectively. We note that the geometry tensor G^- 
in (3.9) involves the mapping from the reference cell and differs from the geometry 
tensor in (3.10), which may involve the mapping from the reference cell and the 

mapping from the reference facet. The form compiler FFC precomputes the reference 
tensor A '? ^ for each facet-facet combination (,f + ,,f~) and a run-time decision 
must be made as to which reference tensor should be contracted with the geometry 
tensor. 

4. Implementation. We give here a short introduction to the FFC language 
before putting the compiler into context with respect to the other components of 
FEniCS. Then we discuss the performance in terms of the efficiency of the generated 
code as well as the benefits of automated code generation in general. 

4.1. The Form Compiler FFC. The form compiler FFC computes the tensor 
representation (2.10) from a high-level description of the variational form, and gen- 
erates efficient C++ code for the computation of the element tensor based on this 
representation. It is also possible to generate code that uses the conventional quadra- 
ture representation. Code can be generated which is consistent with the UFC [1] 
specification, although any format in any language can be implemented. 

The bilinear form for the weighted Laplacian (2.7) can be expressed in the FFC 
form language by a = w*dot (grad(v) , grad(u) ) *dx. Integration over a cell is de- 
noted by *dx, integration over an exterior facet is denoted by *ds and integration over 
an interior facet is denoted by *dS. The FFC form language is equipped with basic 
differential operators including partial derivatives, v.dx(i); the gradient, grad(v); 
the divergence, div(v); and the curl, curl(v). Basic linear algebra operators like 
inner products dot (v, w) and matrix- vector multiplications mult (A, v) are also im- 
plemented. Functions which are evaluated on facets can be 'restricted' to the plus 
or minus sides of the facet. A function v evaluated on the plus side of a facet is 
expressed as v ( ' + ' ) , and the same function evaluated on the minus side is expressed 
as v('-'). Typical discontinuous Galerkin operators are available such as jump(v), 
which is equivalent to v + — v~; jump(v, n) , which is equivalent to v + n + + v~n~ or 
v + ■ n + + v~ ■ n~; and avg(v) , which is equivalent to (v + + v~) /2. 

FFC is written in Python, and the interface to FFC also uses the Python syntax 
which makes the addition of user-defined operators simple. This will be demonstrated 
in Section 5, as will be the use of the operators introduced above. 

4.2. FEniCS. The form compiler FFC is a core component of FEniCS [23], a 
software system aiming at automation of various aspects of computational mathemat- 
ical modelling, in particular the solution of partial differential equations. Other core 
components of FEniCS include FIAT [9, 10], SyFi [3, 21], UFC [2] and DOLFIN [20]. 

FIAT is a tool for generating and tabulating finite element basis functions for 
a range of finite element spaces. FFC calls FIAT at compile-time to evaluate basis 
functions on the reference element as described in Section 3.3. FFC supports the gen- 
eration of C++ code which is consistent with the Unified Form-assembly Code (UFC) 
specification. Any library which supports the UFC interface can use the automatically 
generated code to assemble forms. 
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Finally, DOLFIN is a consistent high-level problem-solving environment (PSE) 
for the solution of partial differential equations. DOLFIN handles the communication 
between the core components of FEniCS. Amongst other things, DOLFIN manages 
meshes (it does not generate meshes) and provides various linear algebra solvers and 
tools, as well as an interface to specialised linear algebra libraries. DOLFIN supports 
the UFC interface and uses the code generated by FFC to assemble and solve the 
global linear systems. 

4.3. Performance. As claimed above, the form compiler FFC generates effi- 
cient code that may compete with and in some cases outperforms hand-written and 
optimised code. This was demonstrated in [12] where speedups as large as a fac- 
tor 1000 were demonstrated for a set of standard test cases. 3 These speedups are 
a consequence of the reduced complexity of computing the tensor contraction (2.10) 
compared to a run-time iteration over quadrature points for certain forms. It was fur- 
ther demonstrated in [11, 15, 14] that these results may be further improved by finding 
a priori so-called complexity-reducing relations between subtensors of the reference 
tensor A . 

As an example, consider here the operation count for evaluating the tensor- 
contraction for the Laplacian, 

With n the dimension of the local finite element function space, each of the n 2 entries 
of the element tensor A K may be evaluated in roughly Tt ~ d 2 (d + 1) + d 2 ~ d 3 
operations; first d+1 operations to evaluate each entry of the d x d geometry tensor 
Gk arid then d 2 operations to perform the tensor contraction. 

If instead we use a loop over N quadrature points at run-time, each entry of A K 
may be evaluated in CN operations, where C is the cost of evaluating the integrand at 
each quadrature point. It is difficult to estimate exactly the size of C, but a reasonable 
estimate is 2d 2 (transforming the gradient of both the test and trial functions from 
the reference element by a matrix-vector product with the inverse transpose of the 
Jacobian matrix) . It is assumed that the values of the gradients of all basis functions 
have been pretabulated at the quadrature points on the reference element. Now, the 
number of quadrature points needed for exact integration is N ~ (2(fc — 1) + l) d = 
(2k — l) d , where k is the polynomial degree of the finite element basis functions. It 
follows that the complexity of quadrature is Tq ~ CN ~ 2d 2 (2k — 1) . We thus find 
that the speedup of the tensor contraction (2.10) compared to quadrature is 

|^2^M = 2(2 ,_ im (42) 

It follows that the speedup may be substantial for moderate sized values of k. For 
k = 1, it is not clear that the speedup is positive, although it was demonstrated in [12] 
that the speedup in this case is a factor ~ 10. 

3 It should be noted that these speedups concern the evaluation of the element tensor on each 
local element. Insertion into the global sparse matrix and solution of the linear system are not 
included. The global speedup is smaller and depends on the efficiency of linear algebra and mesh 
data structures, as well as the problem at hand, the efficiency of the linear solver, and choice of 
prcconditioner . 
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If instead we consider the weighted Laplacian (2.7), it is less clear that the tensor 
contraction outperforms quadrature. As discussed in [12], the relative performance 
of quadrature improves with an increasing number of coefficient functions (weights) 
in the variational form. Because of this, the form compiler FFC supports a mode 
where quadrature code is generated instead of code based on the tensor representa- 
tion (2.10). Even with quadrature, various simple a prior optimisations are possible 
when using prccomputation, such as the elimination of operations on zero entries and 
loop unrolling. 

In summary, the advantage of code generation is not that the tensor representa- 
tion (2.10) always leads to more efficient code than hand- written code, in particular 
since (2.10) may (with some effort) also be coded manually. Instead, the merit lies 
in: (i) the potential to employ sophisticated strategies for evaluation of the element 
tensor as in [12]; (ii) the potential to employ sophisticated compile-time optimisation 
techniques as in [11, 15, 14]; (iii) the generation of architecture-specific code (in par- 
ticular for multicore processors) ; and (iv) the reduction in development time through 
simple and compact coding of finite element variational forms while retaining effi- 
ciency. In particular, the simplicity with which forms can be coded is demonstrated 
in the following section. For all examples in the following section, the time required for 
FFC to generate C++ code is of the order of seconds. The quantity of automatically 
produced code is highly dependent on the complexity of the considered form. 

5. Examples. We demonstrate the compilation of variational forms for discon- 
tinuous Galerkin methods through a number of examples. 

5.1. Poisson's equation. Consider the function space Vh, 

V h = {v G L 2 (SI) : v\ K G P k (K) VK G T} , (5.1) 

where Pk (K) denotes the space of polynomials of degree k on the element K. Set- 
ting V h = V 2 = Vh, the bilinear and linear forms for the Poisson equation with 
homogeneous Dirichlet boundary conditions read [4] 

a(v,u) = / Vv ■ Vu dx- / {vj ■ (Vu) ds - (Vv) ■ {uj ds 
Jn Jr° Jr° 

- I M • Vu ds - I Vv ■ [u] ds+ I • M ds+ I %u ds (5.2) 

Jon Jan Jr° fi Jan h 

and 

L(v) = [ vf dx, (5.3) 
Jn 

where T° denotes interior facets, a is a penalty parameter and ft, is a measure for the 
average of the mesh size defined as h = (h + + hr)j2 for the two cells K + and K~ 
incident with the given interior facet. The size of a cell is defined here as twice the 
circumradius. The jump [•] and average (•) operators are defined as \v\ = v + n + + 
v~n~ and (Vu) = (Vv + + Vv~)/2 on T and [u] = vn on dSl. Here, n + and n 
denote the outward unit normal to the given facet as seen from the two cells K + 
and K~ respectively. The corresponding FFC input for this problem is shown in 
Table 5.1 for 5th order polynomials on triangular elements. The form and syntax of 
the compiler input resembles closely the mathematical notation in (5.2) and (5.3). 
Note that in the code we need to restrict the (potentially) multi-valued function h 
to either K + or K~ (here h ( ' + ' ) ) even if the function in this particular case is 
single-valued (h = (h+ + h~)/2). 
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Table 5.1 

FFC input for the interior penalty method applied to the Poisson equation using k = 5. 



element = FiniteElementO'Discontinuous Lagrange", "triangle", 5) 



v = TestFunction(element) 
u = TrialFunction(element) 
f = Function(element) 

n = FacetNormalC'trlangle") 
h = MeshSize( "triangle") 



alpha =32.0 

a = dot(grad(v) , grad(u) ) *dx \ 

- dot(jump(v, n) , avg(grad(u) ) ) *dS \ 

- dot (avg(grad(v) ) , jump(u, n))*dS \ 

- dot (mult (v , n) , grad(u))*ds \ 

- dot(grad(v), mult(u, n))*ds \ 

+ alpha/h( '+' ) *dot (jump(v, n) , jump(u, n))*dS \ 
+ alpha/h*v*u*ds 



L = v*f*dx 



5.2. Advection diffusion equation. We consider now the advection-diffusion 
equation with homogeneous Dirichlet boundary conditions on inflow boundaries and 
full upwinding of the advective flux at element facets. Setting V } ] = = Vh, where 
Vh is defined as in (5.1), the bilinear and linear forms read 

a(v, u) = Vv • (kVu - bu) dx + / {vj ■ bu* ds + {vj ■ bu* ds 
Jn Jr° Jan 



k[u](Vm) ds- k(Vw) • [u] ds- nfv] ■ Vu ds 
r° Jr" Jan 

-f nVv-lujds+f ^[t;].[«J ds+ / Y« d « ( 5 - 4 ) 
Jan Jr° ' l Jan " 

and 

L(v) = f vf dx, (5.5) 



Jn 

where the vector b is a given velocity field, u* is equal to u restricted to the upwind 
side of a facet, 

'•"I- ' (5.6) 

and k is the diffusion coefficient. The definitions of the jump and average operators are 
the same as for the Poisson equation. The FFC input for this problem is depicted in 
Table 5.2, and is again a reflection of the mathematical formulation. In this particular 
implementation, the same basis has been used for the solution and components of 
the advective velocity field, although different orders can be used. The value of the 
discontinuous function 'of (outflow facet) in Table 5.2 is either 1 or on a side of 
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Table 5.2 

FFC input for the advection-diffusion equation using k = 3. 



scalar = FiniteElement ( "Discontinuous Lagrange", "triangle", 3) 

vector = VectorElement ("Lagrange" , "triangle", 3) 

constant = FiniteElement ("Discontinuous Lagrange", "triangle", 0) 

v = TestFunction(scalar) 
u = TrialFunction(scalar) 

b = Function(vector) 

f = Function(scalar) 

n = FacetNormalC'triangle") 

h = MeshSize( "triangle") 

of = Function(constant) 

kappa = 0.2 
alpha =20.0 

def upwind (b, u) : 

return [b [i] ('+')* (of ('+') *u( '+' ) + of ( ' - ' ) *u( ' - ' ) ) for i in range (len(b) )] 

a = dot(grad(v), mult(kappa, grad(u)) - mult(b, u))*dx \ 
+ dot(jump(v, n) , upwind(b, u))*dS \ 
+ dot (mult (v, n) , mult(b, of*u))*ds \ 

- kappa*dot(jump(v, n) , avg(grad(u)) )*dS \ 

- kappa*dot (avg(grad(v) ) , jump(u, n))*dS \ 

- kappa*dot (mult (v, n) , grad(u))*ds \ 

- kappa*dot (grad(v) , mult(u, n))*ds \ 

+ kappa*alpha/h( '+' ) *dot ( jump (v, n) , jump(u, n))*dS \ 
+ kappa*alpha/h*v*u*ds 

L = v*f*dx 



the interior facet which is being considered. When performing a computation, we 
compute this value in DOLFIN according to the definition in (5.6) and pass it to the 
form as a function. 

If a facet is an outflow facet relative to the element K + , i.e., b ■ n + > 0, then 
the value of of (' + ') is 1, while the value of of ('-') is and vice versa. As a 
consequence, the return value of the 'upwind' function is equal to bu* . The 'upwind' 
function is an example of how one can extend the FFC language with user-defined 
functions written in Python. 



5.3. The Stokes equations. Wc consider next the Stokes equations with a 
mixture of continuous and discontinuous functions, as well as basis functions with 
possibly varying polynomial orders. Consider the function spaces Vh and Qh, 



Vi 6 P k {K)MK e T, 1 < i < 




(5.7) 
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Table 5.3 

FFC input for the Stokes equation using k = 1 . 



V = VectorElement ( "Discontinuous Lagrange", "triangle", 1) 
Q = FiniteElement ( "Lagrange" , "triangle", 1) 

element = V + Q 

(v, q) = TestFunctions (element) 
(u, p) = TrialFunctions (element) 

f = Function(V) 

n = FacetNormalC'triangle") 

h = MeshSize( "triangle") 

alpha =4.0 

a = dot(grad(v) , grad(u) ) *dx + dot(v, grad(p))*dx - dot(grad(q), u)*dx \ 
+ dot(q('+'), jump(u, n))*dS \ 
+ q*dot(u, n)*ds \ 

- dot (mult (avg(grad(v) ) , n('+')). jump(u))*dS \ 

- dot(jump(v), mult (avg(grad(u) ) , n('+')))*dS \ 

- dot (mult (grad(v) , n) , u)*ds \ 

- dot(v, mult (grad(u) , n))*ds \ 

+ alpha/h('+')*dot(jump(v) , jump(u))*dS \ 
+ alpha/h*dot(v, u)*ds 

L = dot(v, f)*dx 



Setting V£ = V 2 = V \ x Qh and u = on dSl, particular bilinear and linear forms 
for the Stokes equation read [5] 

a(v,q;u,p)= / vVv : Vit dx + / v ■ Vp da; — / Vq ■ u dx 
Jn Jn Jn 

q\u ■ nj ds + qu ■ n ds 

r° Jon 



v\v\ : (Vu) ds - / v(Vv) : {uj ds - / v\v\ : Vtt ds 
n Jr° Jan 

f vVv : [u] ds+ / : M ds + / : \u\ ds, (5.9) 

Jon Jr° n Jan 11 

L(v,q) = I v - f dx. (5.10) 



and 



Jn 

The jump [■] and average (•) operators are defined as [«] = v + g) n + + v~ ® n , 
[v ■ nj = v + ■ n + + v~ ■ n~ and (Vw) = (Vi> + + Vv~)/2 on T° and [«] = v ® n 
on dn. The FFC input for this problem with k = j = 1, as proposed in [5], and the 
kinematic viscosity v = 1.0 is shown in Table 5.3. 

5.4. Biharmonic equation. Classically, Galerkin methods for the biharmonic 
equation seek approximate solutions in a subspace of H 2 (fl). However, such functions 
are difficult to construct in a finite element context. Based on discontinuous Galerkin 
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Table 5.4 

FFC input for the biharmonic equation using k = 4. 



element = FiniteElement ( "Lagrange" , "tetrahedron", 4) 

v = TestFunction(element) 
u = TrialFunction(element) 
f = Function(element) 

n = FacetNormaK "tetrahedron") 
h = MeshSize( "tetrahedron") 

alpha = 16.0 

a = dot (div(grad(v) ) , div(grad(u) ) ) *dx \ 

- dot(jump(grad(v) , n) , avg(div(grad(u) ) ) ) *dS \ 

- dot (avg(div(grad(v) ) ) , jump(grad(u) , n))*dS \ 

+ alpha/h( ' + ' ) *dot (jump(grad(v) , n) , jump(grad(u) , n))*dS 

L = v*f*dx 



principles, methods have been developed which utilise functions from H 1 (f2) [6, 22]. 
Rather than considering jumps in functions across element boundaries, terms in- 
volving the jump in the normal derivative across element boundaries are introduced. 
Unlike fully discontinuous approaches, this method does not involve double-degrees 
of freedom on element edges and therefore does not lead to the significant increase 
in the number of degrees of freedom relative to conventional methods. Consider the 
continuous function space 

V h = {ve Hi (0) : v 6 P fc (K) VK E T} . (5.11) 

Setting = = Vh, the bilinear and linear forms for the biharmonic equation, 
with the boundary conditions u = on <9f2 and V 2 it = on dfl, read 

a(v,u) = [ V 2 vV 2 u dx- [ fVv] ■ (V 2 u) ds - [ (V 2 w) • [VuJ ds 
Jn Jt° Jr° 

+ I ?IV«1 ■ IV u} ds, (5.12) 
J r o h 

L{v) = [ vf dx. (5.13) 
Jn 

The jump [■] and average (■) operators are defined as [Vu] = Vv + ■ n + + Vw~ • n~ 
and (V 2 u) = (V 2 v+ + V 2 w^)/2 on T°. The FFC input for this problem with k = 4 is 
shown in Table 5.4. 

For the biharmonic equation we consider an example on the domain f2 = [0,1] x 
[0,1] x [0,1] with / = 9-7T 4 sin(7ra;) sin(7ry) sin(7rz), in which case the exact solution 
u = sin(7rx) sin(7ry) sin(7rz). The observed convergence behaviour is illustrated in 
Figure 5.1 for various polynomial orders. As predicted by a priori estimates, a con- 
vergence rate of k + 1 is observed for k > 2 [6] , and a rate of k for polynomial order 
k = 2 [22]. The example demonstrates that the step from a two-dimensional problem 
to a three-dimensional problem is trivial when using the compiler. 
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Fig. 5.1. Error in the L 2 norm for the biharmonic equation with penalty parameters a — 4, 
a = 16 and a = 16, for k = 2, fc = 3 and fc = 4 respectively. 



element_u = FiniteElement (" Lagrange " , "triangle", 15) 

element_uh = FiniteElementC'Discontinuous Lagrange", "triangle", 5) 

u = Function(element_u) 
u_h = Function(element_uh) 

e = u - u_h 
M = e*e*dx 



5.5. Evaluating functionals. FFC can also be used to generate the required 
code for functionals (forms of rank zero), which is useful for computing the error when 
the exact solution is known or for evaluating various functionals of the computed so- 
lution. For a problem which has been solved using 5th order Lagrange basis functions, 
given the exact solution u and the finite element solution Uh, the error e = u — Uh 
and the FFC input for computing the L 2 norm of the error is shown in Table 5.5. 
The exact solution has been approximated by interpolating the exact solution using a 
continuous 15th order polynomial. The FFC input for computing the mesh-dependent 
semi-norm of the error 



is shown in Table 5.6. Here, T° = UdiT denotes the union of all interior facets of the 
mesh. 

6. Conclusions. An approach for automated code generation for discontinuous 
Galerkin forms has been presented. The concept is manifest in the form of a compiler 
which translates discontinuous Galerkin forms expressed in a high-level language into 
efficient low-level code. 

A special representation for element tensors for discontinuous Galerkin methods 



Table 5.5 
Computation of the error in the L 2 



norm (squared). 




(5.14) 
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Table 5.6 

Computation of the error in a mesh- dependent semi-norm (squared). 



element_u = FiniteElement ( "Lagrange" , "triangle", 15) 

element_uh = FiniteElement ( "Discontinuous Lagrange", "triangle", 5) 

u = Function (element_u) 
u_h = Function(element_uh) 

e = u - u_h 

M = dot(grad(e) , grad(e))*dx + dot(jump(e) , jump(e))*dS 



has also been presented. This representation involves a tensor contraction and per- 
mits the separation of terms which can be computed a priori and terms which are 
computed at run time. Such a representation can lead to improved performance rel- 
ative to conventional quadrature approaches for a variety of forms. However, the 
approach is not generally tractable by hand and necessitates automated code gen- 
eration techniques. The compiler for discontinuous Galerkin forms facilitates rapid 
implementation of new and existing approaches and reduces the time required for 
code testing through generality, while delivering efficient code and providing scope for 
various automated optimisations. 
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